Feature-specific nutrient management of onion (Allium cepa) using machine learning and compositional methods

While onion cultivars, irrigation and soil and crop management have been given much attention in Brazil to boost onion yields, nutrient management at field scale is still challenging due to large dosage uncertainty. Our objective was to develop an accurate feature-based fertilization model for onion crops. We assembled climatic, edaphic, and managerial features as well as tissue tests into a database of 1182 observations from multi-environment fertilizer trials conducted during 13 years in southern Brazil. The complexity of onion cropping systems was captured by machine learning (ML) methods. The RReliefF ranking algorithm showed that the split-N dosage and soil tests for micronutrients and S were the most relevant features to predict bulb yield. The decision-tree random forest and extreme gradient boosting models were accurate to predict bulb yield from the relevant predictors (R2 > 90%). As shown by the gain ratio, foliar nutrient standards for nutritionally balanced and high-yielding specimens producing > 50 Mg bulb ha−1 set apart by the ML classification models differed among cultivars. Cultivar × environment interactions support documenting local nutrient diagnosis. The split-N dosage was the most relevant controllable feature to run future universality tests set to assess models’ ability to generalize to growers’ fields.

Nevertheless, well-documented trials can be assembled into large databases and deciphered using powerful tools of artificial intelligence to support wise decisions on site-specific fertilization.
The traditional objective of conducting fertilizer trials is to define critical and maintenance soil test levels to "feed the plant" (sufficiency levels of available nutrients) or to "feed the soil" (basic cation sufficiency ratios; nutrient buildup and maintenance) 14-16. In Brazil, the concept to "feed the plant" for N fertilization involved the contribution of soil organic matter content to the nitrogen budget of the agroecosystem 17. The concept to "feed the soil" for P and K fertilization relies, respectively, on clay content and soil test P, and on cation exchange capacity (CEC) and soil test K. The clay content is assumed to be related to the soil P-fixing capacity controlling P-use efficiency 18. The CEC implied that soil test K should be maintained at a 'High' soil test level despite high risks of K leaching. The CEC can be computed from exchangeable cations and exchangeable acidity.
While yield-impacting features interact in agroecosystems, testing myriads of interactions between fertilizer management and environmental and managerial features would be a gigantic task. Machine learning (ML) decision trees such as random forest and extreme gradient boosting are commonly used non-parametric data-processing methods that can address multivariate interacting effects in high-dimensional databases 19-21. On the other hand, the classical tissue test interpretation has long been criticized for not considering nutrient interactions 22. This is especially important for onions, a high S-demanding crop, where cross-talks between sulfur and cationic micronutrients as modulated by mycorrhizae 23 are common 24. Nutrient interactions and cross-talks are generally represented by pairwise ratios 25,26. The centered log ratio (clr) transformation is a multi-ratio that expands pairwise ratios by adjusting any nutrient level to the geometric mean across nutrients. The log-ratio transformation can control numerical biases caused by spurious correlations in the statistical analysis of compositional data 27. The clr transformation thus allows means and variances to be computed without bias. Nonetheless, decision-tree machine learning methods could handle nutrient interactions in onion tissue with no need for data transformation.
Fertilizer recommendations, above all for nitrogen, have been puzzling for decades without agreement on which methodology is the best to balance environmental and economic outcomes 24. We hypothesized that (1) a minimum dataset of features easy to document by stakeholders suffices to predict onion yield accurately using machine learning methods, and (2) tissue nutrient standards depend on cultivar × environment interactions. Our objective was to evaluate the capacity of machine learning models to predict onion yields and to derive tissue nutrient standards for onions under Brazilian conditions.

Model performance to predict bulb yields
Features to model bulb yields were cultivar, soil management, cropping system, previous crop, fertilization (N-P-K), soil test results (clay content, CEC, organic matter, pH, nutrients) and climatic variables (length of the growing season, rainfall, SDI, cumulated degree-days, date of crop establishment), as described in Table 6. As shown by the RReliefF scores (Fig. 1), the split-N dosage as well as soil micronutrient and S tests were the most relevant features in relation to bulb yield. Other features were weaker predictors. Soil test Fe reflects the presence of weatherable minerals (Fe) in Cambisols. Fertilizer S, B, Zn and Mn are applied at planting or sowing or as foliar sprays, hence accumulating in the soil. The Cu, Zn, and Mn applied as fungicides contribute to their accumulation in the soil.
Learners were similarly accurate (R2 > 0.90) to predict marketable bulb yields using either all features or a minimum data set of the most relevant features (Table 1 and Fig. 2). Non-climatic features readily available to stakeholders at the beginning of the growing season apparently sufficed to make accurate predictions of marketable yields and draw nutrient response models. The P and K dosage showed little contribution to bulb yield prediction. The N dosage was the most relevant controllable feature.

Tissue nutrient standards
Features used to run ML classification models and compute tissue nutrient standards comprised cultivars and tissue tests. Random forest and extreme gradient boosting returned values for area under curve (AUC) and classification accuracy (CA) at 50 Mg ha−1 yield cutoff (Table 2). The AUC and CA (> 90%) were high whether raw concentrations or centered log ratios were used as features, indicating that the ML models handled nutrient interactions efficiently.
Table 1. Accuracy of machine learning models to predict bulb yields. a Predictors: cultivar, soil management, cropping system, previous crop, method of crop establishment, nutrient dosage (N-P-K), soil analysis (clay content, organic matter, pH, macro-nutrients, micronutrients), length of the growing season, rainfall, standardized Shannon diversity index, cumulated degree-days, date of crop establishment.
… properties is presented in Table 5. Obviously, tissue compositions can be impacted not only by the genetic background of cultivars but also by differential soil properties.

Discussion
Much effort has been deployed by research groups in southern Brazil to reach growers' application scale by accounting for soil test, organic matter content, clay content and cation exchange capacity 17. In the present research, we also considered cultivar, soil and crop management, climatic indices, and tissue tests. Machine learning models using features readily available to the stakeholders were found to be accurate.

Nitrogen recommendations
The nitrogen demand by onions was found to depend on bulb yield, cultivar, tissue nutrient levels, soil properties and fertilizer timing and placement, and thus needed to be calibrated locally 10. Although organic matter content (OMC) did not appear as a relevant feature in relation to bulb yield, as shown by its low RReliefF score, OMC may impact N fertilizer recommendations. The N fertilization of onions in southern Brazil was adjusted to local conditions by accounting for OMC (120, 100 and ≤ 80 kg N ha−1 for OMC of 2.5%, 2.5-5% and > 5%, respectively) and at a rate of 4 kg N ton−1 for yield expectations exceeding 30 Mg ha−1 17. Because OMC was included as a feature in the ML model, OMC may impact the response models in future universality tests. While optimum N fertilization may vary locally from 157 to more than 200 kg N ha−1 28-30, the N dosage must minimize yield loss 28. In Cambisols of Santa Catarina, the best economic yield was reached applying 249 kg N ha−1 in a sandy soil of low organic matter content, and 116-142 kg N ha−1 in clayey soils of medium organic matter content 31. Boyhan et al. 32 reported that N recommendations for onions at maximum yield in Georgia, USA, were 95-123 kg N ha−1 higher than the recommended N rates of 140-168 kg N ha−1. Indeed, the maximum bulb yield of 52 Mg ha−1 on a Thermic Plinthic Paleudult was reached applying 263 kg N ha−1, as suggested by a quadratic model. However, yield differences were not significant applying 263 kg N ha−1 or 140 to 168 kg N ha−1, indicating random variation of onion yields on the plateau and high risk of overfertilization using the quadratic model. Initiating the model close to the observed optimum rate near the yield plateau can avoid that problem of overestimation. Quadratic response models initiated at zero-N depend on the flatness of the slope and may lead to over-fertilization supporting speculative 'insurance' decisions 33. Controlling the trajectory of the quadratic model using an economic constraint alone, the recommended N rate for 'Optima F1' in Minas Gerais state, Brazil, was found to be 148 kg N ha−1 34.
Although the N dosage can vary widely under different growing conditions, the number of N trials was limited (25) in the present study compared with the 93 and 461 multi-environmental N fertilizer trials used to run ML models on potato (Solanum tuberosum) 35 and maize (Zea mays) 36, respectively. More trials and universality tests should be conducted to validate model outcomes in growers' fields.

Phosphorus and potassium recommendations
The P and K features did not appear relevant enough to run the ML models. Irrigation and features that improve P and K diffusion in the soil increase nutrient use efficiency in tropical soils 37. Nevertheless, the number of trials was small for P (5) and K (3) compared to N (25). As a result, more P and K trials should be conducted to support any change in state recommendations 17. State-based recommendations integrate information from available field trials, local knowledge, and agronomic expertise.
The P dosage is generally high in tropical soils due to high soil P-fixing capacity and the limited root system of onions 38. The clay content is representative of P-fixing capacity and is integrated into the Brazilian P recommendation scheme 17. The P Mehlich-1/clay ratio (Mehlich-1 extraction method) could also be used as a soil test similar to the [P/(Al + Fe)] Mehlich-3 ratio (Mehlich-3 extraction method) currently used in North America 13,39-42. In a low-P Humic Dystrophic Cambisol (6.9 mg P-Mehlich1 dm−3 and 24% clay), onion responded linearly to P fertilization in the range of 0 to 210 kg P ha−1 at yield levels up to 45 ton ha−1 28,38. In a medium-P dystrophic red-yellow Latosol (9.1 mg P-Mehlich1 dm−3 and 26% clay), onion responded non-linearly to added P up to ≈131 kg P ha−1 at yield levels of 36-40 ton ha−1 43. In a low-P dystrophic red-yellow Latosol (23.8 mg P-Mehlich1 dm−3), onion responded non-linearly to added P in the range of 27 to 80 kg P ha−1 at yield levels of 75-76 Mg ha−1 44. Those results may fit state recommendations 21 if the yield level is considered. The split of P fertilization may improve P-use efficiency, especially in high P-fixing soils 45. On the other hand, onion P uptake is facilitated by the positive effect of irrigation on the P diffusion process 46. The P dosage using the efficiency coefficient of fertilizer P alone 18 and disregarding water supply that facilitates P diffusion in the soil could thus lead to overfertilization 37. Moreover, colonization of onion roots by arbuscular mycorrhizal fungi (AMF) can regulate the P uptake by exploring a larger volume of soil 47.
The K dosage is most often prescribed to 'feed the soil' depending on the selected maintenance soil test K level and the CEC. In a soil containing 77 mg K-Mehlich1 dm−3 and showing CEC of 7 cmolc dm−3, onion crops responded non-linearly to added K up to 75 kg K ha−1 at yield level of 66 Mg ha−1 43. In a high-K Red-Yellow Argisol showing 97-109 mg K-Mehlich1 dm−3 and CEC of 7 cmolc dm−3, onion responded non-linearly to added K up to 150 kg K ha−1 to reach yield levels of 46-54 Mg ha−1 48. Those results may fit state recommendations 21 if the yield level is considered. While soil K supply capacity also depends on soil mineralogy 49, the K release from minerals that contributes to plant K uptake requires conducting fertilizer trials 50. Large discrepancies may thus occur among K recommendation systems.

Tissue diagnosis
In the present study, we suggested ranges of tissue nutrient levels as nutrient standards to conduct nutrient-by-nutrient diagnosis. S-É Parent 51 suggested using a concept of reachable hyper-islands or 'hyper-blobs', each representing multivariate combinations of successful conditions compared to those of defective specimens.
Using KNN as machine learning model, compositional proximity was shown as a Euclidean distance between the composition of the diagnosed specimen and that of its successful neighbors 52. Benchmark blobs were also called 'Enchanting Islands' 53, 'Humboldtian loci' 54, and 'Ilhas Encantadas' in Portuguese 55. This emphasizes the need to diagnose tissue nutrient compositions holistically rather than separately 56,57.

Need for large and diversified databases
Large and diversified experimental and observational data sets must be acquired by stakeholders to cover the numerous combinations of crop-impacting features in onion agroecosystems 57-60. Kyveryga et al. 33 stated that the development of new nutrient calibration procedures has been limited by the inability in the past to collect a sufficient number of yield responses to enable calculating reliable economic optimum rates. To follow up on model predictions, universality tests are needed to verify the reliability of model outcomes in growers' fields 36,61.
The prediction of N dosage can be conducted as shown in Supplementary Material S4 by providing the site-specific features and drawing a response curve predicted from those features. Such tests require close collaboration with growers to facilitate the acceptance of a site-specific fertilizer program and update the database. Precision farming technologies could allow collecting trustworthy data at low cost in growers' fields. Efforts to develop technological tools of precision agriculture for site-specific fertilization have been limited by nonspecific state-based fertilizer recommendations. For some high-value crops like maize, the nitrogen dosage can be adjusted to local factors using ML methods 36. Observational and experimental data sets could be further combined and processed by machine learning to customize nutrient management for a given set of controllable and uncontrollable features 62. In this paper, accurate ML learners processed a minimum data set to support wise decisions for the feature-specific N fertilization in onion agroecosystems of southern Brazil.

Conclusions
This paper addressed onion nutrient management at local scale. We assembled the results of fertilizer experiments conducted between 2007 and 2020 in Santa Catarina state, the major onion production region in Brazil. We showed that decision-tree machine learning models can return accurate yield predictions under a set of easy-to-collect features. Key features available to growers before planting or seeding included cultivar, soil management, cropping system, previous crop, fertilization (N-P-K), soil test results (clay content, CEC, organic matter, pH, nutrients) and date of crop establishment. The RReliefF scores revealed that split-N dosage as well as soil test S and micronutrients were the most relevant features to predict onion yield. The accuracy of the regression models reached R2 > 90% using random forest and extreme gradient boosting. The N dosage was the most relevant controllable feature to run universality tests in growers' fields to assess the ability of ML models to generalize.
The area under curve and classification accuracy of the classification models also exceeded 90% using random forest and extreme gradient boosting. The cultivar and tissue nutrients impacted bulb yield, allowing cultivar-specific nutrient standards to be developed. Sulfur and micronutrients were the most relevant features to differentiate onion cultivars, indicating cultivar × environment interactions. It is thus advisable to conduct tissue diagnosis considering agroecosystem-specific nutrient standards to reflect cultivar × environment interactions. To set apart genetics and environment, feature-specific cultivar ionomes should be determined in comparable agroecosystems. However, such agroecosystem nutrient standards would require larger and more diversified databases than the one used in this study.

Experimental setup
Fertilizer trials were conducted from 2007 to 2020 in the municipalities of Ituporanga, Atalanta, Lebon Régis and Caçador, Santa Catarina state, Brazil (Fig. 4). The soils of the region are Cambisols, also classified as Nitossolo Bruno Distrophic 63, and Typic Hapludox 64. The subtropical climate is mesothermic and humid with mild summers. According to Köppen's classification, the climate is classified as Cfa in Ituporanga and Atalanta, and as Cfb in Lebon Régis and Caçador.

Climatic data
Daily precipitations as well as minimum and maximum daily temperatures were obtained from the EPAGRI 66 meteorological station closest to the trial. Temperature indices were the minimum and maximum seasonal temperatures and the cumulated degree-days with base temperature of 5 °C for cold crops 64. Rainfall distribution was estimated by the standardized Shannon diversity index (SDI) as follows 65:

$$SDI = \frac{-\sum_{i=1}^{n} p_i \ln p_i}{\ln n}$$

where $p_i$ is the fraction of daily rainfall (RAIN) to the rainfall cumulated during the growing period (PPT), i.e. the daily RAIN/PPT ratio, and $n$ is the length of the growing season; SDI = 1 implied that rainfall was uniformly distributed during the indicated period (equal daily amount of rainfall over the selected period); SDI = 0 implied that rainfall was unevenly distributed (total rainfall concentrated in 1 d). Where $p_i = 0$, $p_i \times \ln p_i = 0$. Crops were sprinkler irrigated.
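The two climatic indices can be illustrated with a minimal Python sketch using only the standard library. The function names are hypothetical, and the degree-day function assumes the simple averaging method (daily mean of minimum and maximum temperature minus the base temperature, floored at zero), which the text does not specify.

```python
import math


def shannon_diversity_index(daily_rain):
    """Standardized Shannon diversity index (SDI) of daily rainfall.

    daily_rain: one rainfall amount per day of the growing season.
    Returns 1 for perfectly uniform rainfall and 0 when all rain falls in 1 d.
    """
    n = len(daily_rain)
    total = sum(daily_rain)  # PPT, rainfall cumulated over the season
    if n < 2 or total <= 0:
        raise ValueError("need at least two days and some rainfall")
    entropy = 0.0
    for rain in daily_rain:
        p = rain / total  # daily RAIN/PPT ratio
        if p > 0:  # by convention p * ln(p) = 0 where p = 0
            entropy -= p * math.log(p)
    return entropy / math.log(n)


def cumulated_degree_days(t_min, t_max, base=5.0):
    """Cumulated degree-days above `base` (assumed simple averaging method)."""
    return sum(max(0.0, (lo + hi) / 2.0 - base) for lo, hi in zip(t_min, t_max))
```

For example, `shannon_diversity_index([2, 2, 2, 2])` evaluates to 1 (uniform rainfall), whereas a season with all rain on a single day evaluates to 0.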

Experimental setup
There were 26 N trials, five K trials and three P trials, totaling 1182 observations (Supplementary Material S4).
Treatments were arranged as randomized block designs with four replications. In Ituporanga and Atalanta, plots were 4 m long and 3 m wide, and comprised eight rows spaced 35 cm apart. Transplants were spaced 8 cm apart on the row. The population of transplants was approximately 375,000 plants ha−1. Bulbs were harvested in five internal rows 4-m long. In Caçador and Lebon Régis, plots were 5 m long and 2.7 m wide, and comprised nine rows spaced 30 cm apart. Plants were spaced 5.5 cm on the row. The population of seeded onions was 600,000 plants ha−1. Bulbs were harvested at leaf in three double line, 5-m long rows, per plot. The bulbs were left on the field for a pre-curing period of one week, then bagged and stored for weighing and sizing. Bulbs were classified as commercial, non-commercial and harvest loss. Marketable bulbs included #2 (< 50 mm), #3 (50-70 mm), #4 (70-90 mm), and #5 (> 90 mm) bulb categories 67. Bulbs showing secondary growth or damage were classified as non-marketable.

Fertilizer treatments
The N, P and K treatments were applied separately at increasing rates at each experimental site. The N rates varied from 0 to 370 kg N ha−1 split-applied 45, 80, 110, and 130 days after seeding (20, 30, 30 and 20% of N broadcast-applied, respectively) or 35, 60, and 85 days after transplanting (30, 40 and 30% of N broadcast-applied, respectively). The P rates ranged from 0 to 349 kg P ha−1. The K rates varied between 0 and 667 kg K ha−1, split-applied together with the N. Where the rates of N, P and K were varied, the rates of the other nutrients were fixed following state recommendations 17. Fertilizers were in granular form. The sources of N were ammonium nitrate, urea, ammonium sulfate, algae-coated ammonium sulfate (29% N, 5% Ca, 2% Mg, 9% S, and 0.3% B), azoslow (organo-mineral fertilizer containing 20% C and 29% N as urea and hydrolyzed proteins) or poultry manure (pH of 7.8, 15.9% moisture, 3.5% N, 3.1% P, 2.7% K, 37 mg Cu kg−1, 43 mg Zn kg−1, 73 mg Mn kg−1, and 1160 mg Fe kg−1). The source of N fertilizer may differ among trials. However, we assumed that differences among mineral N sources were negligible due to the rapid conversion of ammonium to nitrate in agricultural soils 68. The P and K treatments were applied as triple superphosphate and potassium chloride 17. The N and K were split at up to four occasions during the season, i.e., at planting and 35, 60, 85 or 90 d later for transplants, or at planting and 45, 80, 110 or 130 d later for seeded onions 17. The P was applied entirely at planting.

Soil analysis
Soils were sampled in the 0-20 cm layer 45-60 days before planting across the experimental area, then composited. Soils were dried in a forced-air oven at 65 °C then ground to less than 2 mm. Chemical analyses were conducted as follows 17: pH in 1:2.5 soil-to-water volumetric ratio, clay by sedimentation, Mehlich-1 extraction for P and K, and EDTA-extraction for cationic micronutrients. Elements were quantified by colorimetry for P and B, flame photometry for K, turbidimetry for S, and atomic absorption spectrophotometry for Ca, Mg, Cu, Fe, Mn, and Zn. Total carbon was quantified by dichromate oxidation (Walkley-Black procedure) then multiplied by 1.724 to derive organic matter content. Base saturation was computed as the sum of cationic species (cmolc kg−1) divided by CEC computed as the sum of exchangeable cations and acidity. Exchangeable acidity was estimated from the SMP buffer pH 69.
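The derived soil variables described above reduce to simple arithmetic, sketched here in Python. Function names and the dictionary layout are illustrative assumptions, and base saturation is expressed as a percentage for convenience.

```python
def organic_matter(carbon_pct, factor=1.724):
    """Organic matter content (%) derived from carbon (%) using the 1.724 factor."""
    return carbon_pct * factor


def cation_exchange_capacity(exchangeable_cations, exchangeable_acidity):
    """CEC as the sum of exchangeable cations and exchangeable acidity.

    exchangeable_cations: dict of cation name -> amount, e.g. in cmolc per unit soil.
    """
    return sum(exchangeable_cations.values()) + exchangeable_acidity


def base_saturation(exchangeable_cations, exchangeable_acidity):
    """Base saturation (%): sum of exchangeable cations divided by CEC."""
    bases = sum(exchangeable_cations.values())
    cec = cation_exchange_capacity(exchangeable_cations, exchangeable_acidity)
    return 100.0 * bases / cec
```

With hypothetical values of 4.0 (Ca), 2.0 (Mg) and 0.5 (K) and an exchangeable acidity of 3.5, the CEC is 10 and base saturation is 65%.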

Tissue analysis
After planting, leaf analysis, based on appropriate sampling methods and correct interpretation of analytical data, is a reliable tool for assessing the nutritional status of perennial plants and their response to fertilizers 69. Ten young fully expanded leaves were collected in each plot at the beginning of plant differentiation into bulb 17, i.e. 70 to 75 d after transplanting and 115 to 128 d after sowing, depending on year and cultivar. Tissue samples were composited per plot for chemical analysis. The leaves were cleaned gently under distilled water then dried at 65 ± 5 °C and ground to less than 1 mm. Total N was quantified by micro-Kjeldahl. Tissue samples were digested in a mixture of nitric and perchloric acids then analyzed by colorimetry for P and B, flame photometry for K, turbidimetry for S, and atomic absorption spectrophotometry for Ca, Mg, Cu, Fe, Mn, and Zn 70,71.

Statistical analysis
Log ratio transformation
Concentrations are parts of a compositional vector constrained to the compositional space 68 such as 1000 g kg−1 for tissue tests. The compositional space for cationic species could also be defined as cmolc kg−1 and constrained to CEC. Conducting parametric statistical analyses using raw concentrations produces numerical biases that may lead to sums of components in statistical results that differ from the measurement unit (e.g., sums of sand + silt + clay different than 100% after conducting ANOVA). Moreover, ignoring nutrient interactions may decrease the accuracy of nutrient diagnosis using parametric methods 37,55.
In contrast, clr values are relative expressions allowing compositions to move from the constrained compositional space to the unconstrained real space (±∞) that is required to run statistical analyses. Nutrient concentrations are constrained to the measurement unit using a filling value $F_v$ computed by difference as follows using a measurement unit in g kg−1:

$$F_v = 1000 - \sum_{i=1}^{D-1} c_i$$

where $D$ is the number of parts including the filling value, and $c_i$ is the concentration of each nutrient and the filling value. The centered log ratio centers any concentration against the geometric mean across parts [$clr_i = \ln(x_i/G)$], hence accounting for all pairwise ratios that reflect nutrient interactions and cross-talks 24,69, as follows for nitrogen (N):

$$clr_N = \ln\left(\frac{N}{G}\right), \quad G = \left(N \times P \times \cdots \times F_v\right)^{1/D}$$

Because the clr values are computed about the geometric mean, the sum of clr values is zero. The mean clr value for component $i$ can be back-transformed into its concentration value $x_i$ as follows:

$$x_i = \kappa \, \frac{\exp(clr_i)}{\sum_{j=1}^{D} \exp(clr_j)}$$

where exp is the exponential transformation of the centered log ratio and $\kappa$ is the unit of measurement (e.g., 1000 g kg−1) to force closure to the measurement unit (here, g kg−1).
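The closure with a filling value, the clr transformation and its back-transformation can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: function names are hypothetical, concentrations are in g kg−1, and all parts must be strictly positive for the logarithm to be defined.

```python
import math


def close_with_filling_value(nutrients, total=1000.0):
    """Return the composition with the filling value Fv appended (parts sum to `total`)."""
    fv = total - sum(nutrients.values())
    if fv <= 0:
        raise ValueError("nutrient concentrations exceed the measurement unit")
    parts = dict(nutrients)
    parts["Fv"] = fv
    return parts


def clr(parts):
    """Centered log ratio: ln of each part over the geometric mean across parts."""
    logs = {name: math.log(value) for name, value in parts.items()}
    log_gmean = sum(logs.values()) / len(logs)  # ln of the geometric mean G
    return {name: value - log_gmean for name, value in logs.items()}


def clr_inverse(clr_values, total=1000.0):
    """Back-transform clr values to concentrations closed to `total`."""
    exps = {name: math.exp(value) for name, value in clr_values.items()}
    denom = sum(exps.values())
    return {name: total * value / denom for name, value in exps.items()}
```

Applying `clr_inverse(clr(close_with_filling_value(tissue)))` to a tissue composition returns the original concentrations, and the clr values sum to zero, as stated in the text.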
Exchangeable acidity was computed from the SMP buffer pH as follows:

$$\text{Exchangeable acidity} = 10 \exp\left(7.76 - 1.053 \times pH_{SMP}\right), \quad R^2 = 0.98$$

Compositional proximity is measured as the Euclidean distance between the diagnosed composition and that of a successful neighbor:

$$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{D} \left(clr_i - clr_i^*\right)^2}$$

where $clr_i$ is the clr value of component $i$ of the diagnosed composition, and $clr_i^*$ is the clr value of component $i$ of a close successful compositional neighbor. In Brazil, clr indices are widely used to diagnose the plant nutrient status 72 using clr reference values 73. Tissue nutrient indices ($I_{x_i}$) are differences between the diagnosed clr value ($clr_{x_i}$) and the clr mean ($clr^*_{x_i}$) for true negative specimens (TN) weighted by the standard deviation ($SD^*_{x_i}$), computed as follows 74:

$$I_{x_i} = \frac{clr_{x_i} - clr^*_{x_i}}{SD^*_{x_i}}$$

Nutrient indices can be displayed in a histogram to indicate relative excess or shortage of nutrients, respectively. The nutrient standards for high-yielding and nutritionally balanced specimens can be computed regionally (e.g., across the surveyed area), or from a selection of close compositional neighbors.
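As an illustration of the diagnosis step, the sketch below computes the Euclidean distance between two clr-transformed compositions and the nutrient indices (diagnosed clr value minus the clr mean of true negative specimens, divided by their standard deviation). Function names and the example numbers are hypothetical.

```python
import math


def euclidean_distance(clr_diagnosed, clr_neighbor):
    """Euclidean distance between two compositions expressed as clr values."""
    return math.sqrt(
        sum((clr_diagnosed[k] - clr_neighbor[k]) ** 2 for k in clr_diagnosed)
    )


def nutrient_indices(clr_diagnosed, clr_mean_tn, clr_sd_tn):
    """Standardized nutrient indices relative to true negative (TN) specimens.

    Positive values indicate relative excess, negative values relative shortage.
    """
    return {
        k: (clr_diagnosed[k] - clr_mean_tn[k]) / clr_sd_tn[k] for k in clr_diagnosed
    }
```

With hypothetical clr values of 0.5 (N) and −0.5 (P) for the diagnosed specimen and 0.2 and −0.1 for the reference, the distance is 0.5; with standard deviations of 0.1 and 0.2, the indices are about 3 for N (relative excess) and −2 for P (relative shortage).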

Machine learning models
Several machine learning (ML) models can be tested using the Orange data mining freeware v. 3.29. In the ML models, the target variable was marketable bulb yield. Features were nutrient dosage, soil and tissue analyses, cultivar, crop establishment (direct seeding or manual transplanting), soil management, municipality, climatic indices, date of stand establishment, and harvest date (source), as described in Table 6.
Summaries of tissue and soil test results used as features are presented in Table 3 and Supplementary Material S3, respectively. Other features were managerial or climatic. 'Empasc 352 Bola Precoce' and 'SCS373 Valessul' are short-day cultivars requiring 11-13 h to initiate bulbing. Median-day cultivars requiring 13-15 h to initiate bulbing were 'Epagri 362 Crioula Alto Vale', 'Mulata', 'Omega' and 'Caeté'. We discarded 'Bola Precoce' specimens because tissue analysis for sulfur was absent. Crops were established by direct seeding or were transplanted manually. Stand establishment, soil management and previous crops are reported in Supplementary Material S4. Previous crops were black oat (Avena sativa), millet (Pennisetum glaucum), sweet potato (Ipomoea batatas), tobacco (Nicotiana tabacum), corn (Zea mays), cowpea (Vigna unguiculata (L.) Walp.) and velvet bean (Mucuna aterrima). Preceding crops varied among years and locations. Climatic conditions varied widely at experimental sites as shown in Supplementary Material S5. The importance of features in relation to bulb yield was measured as RReliefF ranking scores 75. The RReliefF algorithm computes a difference between actual and predicted values in regression problems based on the nearest neighbor paradigm after considering feature interactions.
Two decision-tree ML regression models were tested among more than 100 variants commonly used in soil science 40,66, i.e., random forest and extreme gradient boosting, both available in the Orange Data Mining freeware v. 3.39.0 programmed in the Python language (University of Ljubljana, Ljubljana, Slovenia). The Python algorithms are encoded into icons and arrows. The scheme of icons and arrows is presented in Supplementary Materials S1 and S2. There were several missing data in the dataset (13%). The dataset was thus rebalanced by model-based imputation using the random forest imputation method 76,77.
Decision-tree models separate two subsets recursively about cutoff points that minimize the variance of the target variable until a minimum number of instances is reached.Random forest and extreme gradient boosting are structurally different.Random forest is a bagging model that averages predictions made by sampling with replacement.We selected 10 trees per bag at each run.Extreme gradient boosting is a variant of the tree-based ensemble gradient boosting method that combines weak predictive models to minimize prediction error.The extreme gradient boosting creates and adds trees of learners sequentially to correct the weakness of the preceding estimators.We selected 100 trees as basic property.
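The variance-minimizing split at the core of a regression tree can be sketched for a single numeric feature as follows. This is a simplified stand-alone illustration, not the Orange implementation; the function names and the `min_instances` parameter are assumptions. Random forest and gradient boosting then build many such trees, by bagging and by sequential error correction, respectively.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(feature, target, min_instances=2):
    """Find the cutoff on one feature that minimizes the weighted target variance.

    Returns (cutoff, weighted_variance), or None when no valid split exists.
    """
    pairs = sorted(zip(feature, target))
    best = None
    for i in range(min_instances, len(pairs) - min_instances + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        cutoff = (pairs[i - 1][0] + pairs[i][0]) / 2.0  # midpoint between neighbors
        if best is None or score < best[1]:
            best = (cutoff, score)
    return best
```

For hypothetical N rates of 0, 50, 100, 150, 200 and 250 kg N ha−1 with yields of 20, 22, 21, 48, 50 and 49 Mg ha−1, the best cutoff falls at 125 kg N ha−1, separating the low-yield from the high-yield group. A full tree would apply the same search recursively to each subset and across all features.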
The partition between the training and testing datasets was conducted by stratified random sampling. The population of data comprised subgroups of categorical variables or strata. Data were randomly sampled within each stratum. This avoids sampling data from the same strata during the partition between the training and testing sets. Otherwise, complete random sampling leads to model overfitting. The train/test partitions were repeated 100 times, and model accuracy was averaged. The accuracy of the partition between the training and the testing sets reached a plateau at 70:30. Such partition was thus selected to process the data. The regression ML model returns a relationship between the actual and the predicted target variable. Model accuracy is reported as root mean squared error (RMSE), median absolute error (MAE), and coefficient of determination or R2. Model strength is substantial if R2 is > 75% 78. The classification model returns a confusion matrix where specimens are classified into four quadrants: true negative (yield above cutoff, nutritionally balanced composition), false negative (yield below cutoff, nutritionally balanced composition), false positive (yield above cutoff, nutritionally imbalanced composition) and true positive (yield below cutoff, nutritionally imbalanced composition). True negative specimens provided a set of successful features to compute tissue nutrient standards amongst others. The accuracy of the classification model is measured by the area under curve and the classification accuracy.
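The evaluation steps above can be sketched in standard-library Python: a stratified 70:30 partition, the three regression metrics, and the assignment of a specimen to a confusion-matrix quadrant. This is an illustrative sketch rather than the Orange workflow used in the study; function names, the `stratum_of` callback and the `seed` parameter are assumptions.

```python
import math
import random
from collections import defaultdict
from statistics import median


def stratified_split(rows, stratum_of, train_fraction=0.7, seed=0):
    """Split rows into training and testing sets, sampling within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    train, test = [], []
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def regression_metrics(actual, predicted):
    """RMSE, median absolute error (MAE) and R2 between actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / len(actual)),
        "MAE": median(abs(e) for e in errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


def quadrant(bulb_yield, balanced, cutoff=50.0):
    """Confusion-matrix quadrant as defined in the text (negative = balanced)."""
    high = bulb_yield > cutoff
    if balanced:
        return "TN" if high else "FN"
    return "FP" if high else "TP"
```

Repeating `stratified_split` with different `seed` values and averaging the metrics mirrors the repeated train/test partitions described above.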

Figure 1. RReliefF scores indicating the most relevant features in relation to bulb yield.

Figure 4. Geographic location of the Caçador, Lebon Régis, Atalanta and Ituporanga municipalities in Santa Catarina State, southern Brazil, where onion fertilizer trials were conducted. Map created by QGis software 65 version 3.34.3.

Table 2. Area under curve (AUC) and classification accuracy (CA) for machine learning models at yield cutoff of 50 Mg ha−1 using raw concentrations or clr values as features. N, P, K, Ca, Mg, Fe, Mn, Zn, Cu, B, S.
Columns: Learner | Nutrient concentration values (Area under curve, Accuracy) | Centered log-ratio-transformed nutrient concentration values (Area under curve, Accuracy)
Figure 3. Information gain from cultivar and raw nutrient concentrations impacting onion yields.

Table 3. Comparison between tissue compositions of nutritionally balanced cultivars at high yield level (> 50 ton ha−1). Values in parentheses are confidence intervals (C.I.) about centered log ratio (clr) means at P = 0.01. Centroids are means of clr values that have been back-transformed to familiar raw concentration values. † Filling value calculated by the difference between the geometric means and the initial filling value.

Table 4. Lower quartile (LQ) and higher quartile (HQ) of nutrient concentrations for nutritionally balanced onion cultivars producing more than 50 ton ha−1.

Table 5. Soil test median values at experimental sites for four cultivars at high yield level (> 50 Mg ha−1).
In a work carried out in Santa Catarina state with cultivar Empasc 352 Bola Precoce, 86.5 kg K ha−1 was taken up by the onion crop at yield level of 37 Mg ha−1, accumulating 2.3 kg of K per Mg 47.